AN ADAPTIVE EULER-MARUYAMA SCHEME FOR SDES: 
CONVERGENCE AND STABILITY 
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Abstract. The understanding of adaptive algorithms for SDEs is an open area where 
many issues related to both convergence and stability (long time behaviour) of algorithms 
are unresolved. This paper considers a very simple adaptive algorithm, based on controlling 
only the drift component of a time-step. Both convergence and stability are studied. 

The primary issue in the convergence analysis is that the adaptive method does not 
necessarily drive the time-steps to zero with the user-input tolerance. This possibility must 
be quantified and shown to have low probability. 

The primary issue in the stability analysis is ergodicity. It is assumed that the noise is 
non-degenerate, so that the diffusion process is elliptic, and the drift is assumed to satisfy a 
coercivity condition. The SDE is then geometrically ergodic (averages converge to statistical 
equilibrium exponentially quickly). If the drift is not linearly bounded then explicit fixed 
time-step approximations, such as the Euler-Maruyama scheme, may fail to be ergodic. In 
this work, it is shown that the simple adaptive time-stepping strategy cures this problem. 
In addition to proving ergodicity, an exponential moment bound is also proved, generalizing 
a result known to hold for the SDE itself. 

Key Words: Stochastic Differential Equations, Adaptive Time-Discretization, Conver- 
gence, Stability, Ergodicity, Exponential Moment Bounds. 



1. Introduction 
In this paper, we study the numerical solution of the Ito SDE 
(1.1) dx(t) = f(x(t))dt + g(x(t))dW(t), x(0) = X 

by means of an adaptive time-stepping algorithm. Here x(t) e R m for each t and W(t) is a 
d— dimensional Brownian motion. Thus / : W m — > W 71 and g : W 71 — > M. mxd . For simplicity 
we assume that the initial condition is deterministic. Throughout | ■ | is used to denote either 
the Euclidean vector norm or the Frobenius (trace) matrix norm as appropriate. We assume 
throughout that /, g are C 2 . Further structural assumptions will be made where needed. The 
basic adaptive mechanism we study is detailed at the start of the next section. It is a simple 
adaptive algorithm, prototypical of a whole class of methods for the adaptive integration of 
SDEs. Our aim is twofold. First we show convergence, as the user-input tolerance r tends 
to zero; this is a non-trivial exercise because the adaptive strategy does not imply that the 



department of Mathematical Sciences, George Mason University, Fairfax, VA22030, USA. 

hlcimba@gmu.edu 

2 Department of Mathematics, Duke University, Durham, NC 27708, USA. 

j onm@math. duke . edu 

3 Mathematics Institute, Warwick University, Coventry, CV4 7AL, England. 

andrew.m.stuart@warwick.ac.uk 



1 



time-steps taken tend to zero with the tolerance everywhere in phase space. Secondly we 
show that the methods have a variety of desirable properties for the long-time integration of 
ergodic SDEs, including preservation of ergodicity and exponential moment bounds. 

The adaptive method controls the time-step of a forward Euler drift step so that it de- 
viates only slightly from a backward Euler step. This not only controls an estimate of the 
contribution to the time-stepping error from the drift step, but also allows the analysis of 
stability (large time) properties for implicit backward Euler methods to be employed in the 
explicit adaptive method. Numerical experiments suggest that both the convergence and 
stability analyses extend to a number of more sophisticated methods which control different 
error measures; some of these experiments are reported below. 

It is of interest to discuss our work in the context of a sequence of interesting papers which 
study the optimality of adaptive schemes for SDEs, using various different error measures 
[T2] IT31 ITU |21]. For many of these error measures, which are quite natural in practice, 
the asymptotically optimal adaptive schemes are based solely on the diffusion. This is 
essentially because it is the diffusion term which dominates the (lack of) regularity in paths 
and this regularity in turn dominates error measures. Why then, have we concentrated on 
methods which adapt only on the drift? The reason for this is that, as mentioned above, such 
methods are advantageous for long-time integration. In practice, we anticipate that error 
controls based on both drift and diffusion could combine the advantages of the asymptotically 
optimal schemes with the enhanced stability/ergodicity of schemes which control based on 
the drift. 

In order to prove a strong mean-square convergence result for this algorithm, it is first 
necessary to obtain a suitable upper bound on the sequence of timesteps used. These bounds 
mimic those used in the convergence proofs for adaptive ODE solvers [29] 1X7] ITB] and require 
that the numerical solution does not enter neighbourhoods of points where the local error 
estimate vanishes. (Requiring that these neighbourhoods are small excludes some simple 
drift vector fields, such as constants. In practice, we would anticipate controlling on both 
the drift and the diffusion, minimizing this issue). An essential part of the analysis is a 
proof that the contribution to the mean-square error from paths that violate this condition 
is suitably small. 

Adaptivity is widely used in the solution of ordinary differential equations (ODEs) in an 
attempt to optimize effort expended per unit of accuracy. The adaptation strategy can be 
viewed heuristically as a fixed time-step algorithm applied to a time re-scaled differential 
equation |B] and it is of interest to study convergence of the algorithms as the tolerance 
employed to control adaptation is reduced to zero [T7] . However adaptation also confers sta- 
bility on algorithms constructed from explicit time-integrators, resulting in better qualitative 
behavior than for fixed time-step counter-parts. This viewpoint was articulated explicitly in 
[27] and subsequently pursued in jT], [TU] and jHIj for example. In particular the reference 
[3T] studies the effect of time-discretization on dissipative structures such as those high- 
lighted in [7JES]- It is shown that certain adaptive strategies have the desirable property 
of constraining time-steps of explicit integrators so that the resulting solution update differs 
in a controlled way from an implicit method. Since many implicit methods have desirable 
stability properties (see [3] and [3U], Chapter 5) this viewpoint can be used to facilitate 
analysis of the stability of adaptive algorithms |31j . 

In pT] . stochastic differential equations (SDEs) with additive noise and vector fields satis- 
fying the dissipativity structures of [7J E2] are studied. There, and in [23 E3 E3] , it is shown 
that explicit time-integrators such as Euler-Maruyama may fail to be ergodic even when 
the underlying SDE is geometrically ergodic. The reason is that the (mean) dissipativity 
induced by the drift is lost under time-discretization. Since this is exactly the issue arising 
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for explicit integration of dissipative ODEs, and since this issue can be resolved in that 
context by means of adaptation, it is natural to study how such adaptive methods impact 
the ergodicity of explicit methods for SDEs. In recent years, the numerical solution of SDEs 
with gradient drift vector fields has been used as the proposal for an MCMC method for 
sampling from a prescribed density, known only up to a multiplicative constant - a technique 
referred to as Metropolis-adjusted Langevin algorithm j2]. In this context, it is very desir- 
able that the time discretization inherit ergodicity. The adaptive scheme proposed here is 
an approach to ensuring this. In this sense our work complements a number of recent papers 
concerned with constructing approximation schemes which are ergodic in situations where 
the standard fixed step Euler-Maruyama scheme fails to be: in |25 i a Metropolis-Hastings 
rejection criterion is used to enforce ergodicity; in (HJ [2E] local linearization is used; in [21] 
implicit methods are used. Although the adaptive method that we analyze here is proved to 
be convergent on finite time intervals, it would also be of interest to extend the work of Talay 
[32] . concerned with convergence proofs for invariant measure under time-discretization, to 
the adaptive time- step setting considered here. 

In Section [21 we introduce the adaptive algorithm, together with some notation. In Sec- 
tion [3] the finite time convergence result for the adaptive method is stated. The proof is 
given in Section 0] and proceeds by extending the fixed step proof given in JTj; the extension 
is non-trivial because the adaptivity does not force time-steps to zero with the tolerance in 
all parts of the phase space. In Section we state the main results of the paper on the 
stability of the adaptive method. All results are proved under the dissipativity condition 

(1.2) 3a, G (0,oo) : (f(x),x) < a - (3\x\ 2 Vx G M m , 

where (■, •) is the inner-product inducing the Euclidean norm, as well as a boundedness and 
invertibility condition on the diffusion matrix g. The results proven include ergodicity and 
an exponential moment bound; all mimic known results about the SDE itself under (II .2|) . 
Section E] starts with a number of a priori estimates for the adaptive scheme of Section [2] 
and proceeds to proofs of the stability stated in Section [5] Numerical results studying both 
convergence and ergodicity are presented in Sections EHEE Some concluding remarks and 
generalizations are given in Section ITU1 

2. Algorithm 

The adaptive algorithm for (jl.lj) is as follows: 

k n = G(x n , k n -±), k-i = K 

x n+ i = H{x n , A n ) + \^A n g(x n )rj n+1 , x = X, 

where A n = 2~ kn A max . Here 

H(x,t) = x + tf(x) 

and 

Q(x, l) = min{k G Z + : \f(H(x, 2~ k A max )) - f(x)\ < r & k > I - 1}. 

The random variables r]j G M. d form an i.i.d. sequence distributed as Af(0, 1). The parameter 
K defines the initial time-step and r > the tolerance. Note that the algorithm defines a 
Markov chain for (x n , k n _i) on R d x Z + . 
We may write 

x n x n -\- A n f{x n )^ 

x Tl+ i = xl + \/A n g(x n )r] n+1 . 
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(2.1) 



If K G Z + then k n G Z + and the error control enforces the condition 

A n < min{2A n _i, A max }, 
where A max is the fixed maximum time-step. Furthermore we have 

-/(*„) | <T. 

In the absence of noise, this implies that the difference between an Euler approximation 
at the next time-step, and an explicit second order approximation, is of size 0(A n r). In 
the presence of noise, it imposes a similar restriction on the means. As mentioned in the 
introduction, in practice we would anticipate combining this drift error control with others 
tuned to the diffusion. 

2.1. Notation. The most important notation conceptually is concerned with making rela- 
tionships between the numerical approximations at discrete steps, and the true solution at 
certain points in time. To do this we define T n to be the sigma-algebra generated by n steps 
of the Markov chain for (x n , fc n _i). Let 

t n = t n -\ + A n _i, to = 0, 

5 > and define the stopping times Nj by Nq — and, for j > 1, 

N j = mi{n:t n >5 + t N }. 

J n>0 J 

Where the dependence on 5 is important we will write Nj(6). It is natural to examine the 
approximate process at these stopping times since they are spaced approximately at fixed 
times in the time variable t. Theorem |21 in Section El shows that these stopping times are 
almost surely finite, under the dissipativity condition (jl.2j) . Notice that 

§-■=§< t N . - t Nj _ 1 <S + A m ax ■= S + . 

When considering strong convergence results it is necessary to interpret ^/A n r] n+ i in the 
adaptive algorithm as the Brownian increment W(t n+ i) — W(t n ). 
We let 

Vj = x Nj+1 and lj = k Nj . 

The Markov chain for {yj, lj} will be important in our study of long time behaviour and we 
will prove that it is ergodic. Let Qj = Tn^ the filtration of events up to the j th stopping 
time. 

It is convenient to define two continuous time interpolants of the numerical solution. We 

set 

(2.2) X(t)=x n , t G [t n ,t n+1 ), 

(2.3) X(t)=X+ [ f(X(s))ds+ [ g(X(s))dW(s). 

Jo Jo 

Hence, for t G [t n ,t n+ i) 

(2.4) X(t) =x n + (t- t n )f(x n ) + g(x n )[W(t) - W{t n )\ 

(2.5) = (1 - a n (t))x n + a n (t)x* n + g(x n ) [W(t) - W(t n )\ 

for a n (t) = (t - t n )/(t n+1 - t n ) G [0, 1). 

It is sometimes important to know the smallest step-size beneath which the error control 
is always satisfied, at a given point x. Hence we define 

k*(x) =min{A; G Z+ : \f(H(x,2- l A max )) - f(x)\ <r\/l>k} and k*(B) = supk*(x), 

x&B 
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noting that, by continuity of /, k*(B) is finite if B is bounded. 

Because of the boundedness of g we deduce that there are functions o~(x) and constants 
a, a > such that, for rj distributed as rji and independent of x, 

E\g{x)r]\ 2 := a 2 (x) < a 2 , and E| \g(x)r]\ 2 - a 2 (x)\ 2 \ < aa 4 . 

The following definitions will be useful: 



a = a H — r, — r, 3 n = = — . 



7 = l+/9A f 



We will always assume that r is chosen small enough so that > 0. The constants 7 is 
chosen so that 

(1 + t)- 1 < (1 - 7 -*) < e-T"* V* G [0, 2M™,]. 

3. Convergence Result 
We start by discussing the error control mechanism. We define Fi(u) by 

F 1 (u) = df(u)f(u), 

the function F 2 (u, h) by 

F 2 (u, h) := h- 1 (/(« + />/(«)) - /(«) - ^(u)) 

and /i) by 

EK/ i ) = /( M + / i /( M ))-/H. 

Now, since / G C 2 , Taylor series expansion gives 

(3.6) E(u,h) = h[F 1 (u) + hF 2 (u,h)] 

where F\,F 2 are defined above. Note that the error control forces a time-step so that the 
norm of E(x n , A n ) is of order 0(t). Estimating the implications of this for the time-step A n 
forms the heart of the convergence proof below. 

In order to state the assumptions required for the convergence result we define, for R, e > 0, 
the sets 

tf(e) = {116 R m : |Fi(u)| < e}, B R = {ue M m : \u\ < R} and B R , e = B R \ *(e) 
and introduce the constant K R = sup u6BH)hg [ ,At max ] l^iu, h)\. Now define the following: 



or 



= inf{t>0: \X(t)\ >R}, PR 
= infO>0:|F 1 (X(t))| <e}, p e 
= a R Aa e , p R , e 



= mi{t > : \x(t)\ > R}, 6 R 

= inf{t > : |Fi(a:(*))| < 2e}, 9 t 
= p R Ap e , 9 Rje 



= cr R A p R 
= a e A p € 
= 9 R A9 e . 



The first assumption is a local Lipschitz condition on the drift and diffusion co-efficients, 
together with moment bounds on the true and numerical solutions. 

Assumption 3.1. For each R > there exists a constant C R , depending only on R, such 
that 



(3.7) 



\f(a) - f{b)\ 2 V \g(a) - g{b)\ 2 < C R \a - b\ 2 , Va, b G M m with \a\ V \b\ < R. 



For some p > 2 there is a constant A, uniform in r — > 0, such that 



(3.8) 



E 



sup \X(t)\ p 


VE 


sup \x(t) \ p 


0<t<T 




0<t<T 



< A. 
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Note that inequality ()3.7|) is a local Lipschitz assumption which will be satisfied for any 
/ and g in C 2 . The inequality ()3.8|) states that the p th moments of the exact and numerical 
solution are bounded for some p > 2. Theorem 0] proves ()3.8|) for the numerical interpolant, 
under natural assumption on / and g (see Assumption 15. lj) . Under the same assumptions, 
such a bound is known to hold for x(t); see [T$] . 

We clearly also need an assumption on the local error estimate since if, for example, the 
drift term f(u) were constant then E(u,h) = and the stepsize would, through doubling, 
reach A max , no matter how small r is, and convergence cannot occur as r — > 0. Because 
the function Fi(u) maps M m into itself, the following assumption on the zeros of Fi(u) will 
hold for generic drift functions / which are non-constant on any open sets; it does exclude, 
however, the case of constant drift. Furthermore the assumption on the hitting time rules 
out dimension m = 1. 

Assumption 3.2. Define 

£(e, R) = d H {V(2e) c H B R , *(e) n B R }. 

For any given R > we assume that lie, R) > for all sufficiently small e > 0, and that 
£(e, R) — > as e — > 0. Furthermore, the hitting time p e satisfies, for any X ^ ^(0), 

p t — ► oo as e — > a.s. 

Here dn denotes Hausdorff distance. The preceding assumption requires that the contours 
defining the boundary of \l/(e) are strictly nested as e increases, and bounded. This enables 
us to show that the probability of (x(t),X(t)) E (^(2e) c n B R ) x (*(e) n B R ) is small, a key 
ingredient in the proof. 

We now state the strong convergence of the adaptive numerical method, using the continuous- 
time interpolant X(t). Note that we do not assume A max — > for this theorem. Hence the 
non-standard part of the proof comes from estimating the contribution to the error from 
regions of phase space where the time-step is not necessarily small as r — > 0. 

Theorem 1. Assume thatX ^ ^(0). Let Assumptions ^. 1\ and \H. 6 J hold. Then, there is A c (r) 
such that, for all A_i < A c (r) and any T > 0, the numerical solution with continuous-time 
extension X(t) satisfies 



E 



sup \X(t) - x(t)p 

0<t<T 



as t — > 0. 



4. Proof of Convergence Result 

The primary technical difficulty to address in convergence proofs for adaptive methods is 
to relate the time-step to the tolerance r. Roughly speaking the formula (J3.6)) shows that, 
provided Fi(u) ^ 0, the error control will imply A n = 0{r). We now make this precise. We 
provide an upper bound on the timestep sequence of numerical solutions that remain within 
Bji }e , for sufficiently small r. For given R, e > we define the quantities 

ha, = an d T R,< 



6K R ^ 12K R 

Lemma 4.1. For any R,e > 0, if {x n }^ =Q C B R e , r < r Rt6 and A_i < — then 

— 2t 

(4.9) A n < mm{h R>e , — } V < n < N. 



Proof. The error control implies 

\E{x n ,A n )\ = A n \F 1 (x n ) + A n F 2 (x n ,A n )\ <r. 

Note that 

A_x < 2r R>e /e = h Rt6 . 

We first proceed by contradiction to prove A n < h Re V < n < N . Let < m < N be the 
first integer such that A m > h R ^. Then, since there is a maximum timestep ratio of 2, we 
have 



A m e 



QK R "ZK R S 



A m \F 2 (x m , A m )\ < 



\E(x m ,A m ) \ > A m (e-e/2) > 



eh 



B.c 



TR,e > T. 



2 12K R 

Thus A m is not an acceptable timestep, contradicting our original assumption. The first 
result follows. The proof of the bound on the timestep in ()4.9J) now follows immediately 
since 

T 



A n < 



t 2t 
< — < — V0<n< N. 



Fi(x n ) + A n F 2 (x n , A n )\ ~ (e-e/2) ~ e 



□ 



Proof of Theorem 1 We denote the error by 

e(t) :=X(t)-x(t). 
Recall the Young inequality: for r^ 1 + q^ 1 = 1 



ab < -a r + -^b\ Va, 6, 5 > 0. 
r qo q/r 



We thus have for any 5 > 



E 



sup \e(t)[ 

0<t<T 



= E 

< E 



sup |e(t)| 2 l{^, £ >T} 

0<t<T 



+ E 



sup \e(t)\ 2 l{e R>e < T} 

0<t<T 



(4.10) 
Now 

But 

whilst 



sup |e(t A^, e )| 2 l{^, £ >T} 

P<t<T 
1 - " 



25^ 
+ — E 

V 



sup |e(£)| p 

0<t<T 



P (Or,, <T)= ¥{9 r <T} + F{6 e < T, 6 R > T}. 
<T}< P{a R < T} + F{p R < T} 
nt>e <T,9 R >T}< F{p e <T} + F{6 e <T,6 R > T, p e > T}. 



Thus we have 

P (0 R , e <T)< ¥(a R < T) + ¥(p R < T) 



<T)+F{9 e <T,9 R >T,p e >T}. 



To control the last term notice that whenever 8 e < T, 9 R > T and p e > T we know that 
| e(cx e ) | > £(e,R). Hence we have 

< T,8 R >T,p e >T}< F{\e(TA9 R , e )\ > £(e, R)} < E|e(T A R , e )\ 2 /e(e, R) 2 . 
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Combining the two preceding inequalities gives 

P(0r, £ <T) <F(a R <T) +¥( PR <T) +¥( Pe <T) +e( sup \e(t A 6 R , e )\ 2 )/£(e, Rf. 

v 0<t<T 

By Markov's inequality 



so that 
(4.11) 

Furthermore, 
(4.12) 



[ctr<T},¥{ Pr <T}<^. 



2A / 
P(0*, £ < T) < — + P(p e < T) + E sup \e(t A 6 R , 

W V 0<T 



)| 2 )/^Rf 



E 



sup |e(t)| p 


< 2 P " X E 


0<t<T 





sup (|x(t)r+|x(t)r) 

0<t<T 



< 2 P A. 



Using (|4.11|) . ()4.12j) in ()4.10|) gives, for e sufficiently small 

p- 2 



E 



sup \e(t)\' 

0<t<T 



< 



(4.13) 



1 ' p<5 2 /(p- 2 )£(e, i?) 5 

2 P+1 <M (p - 2) 
+ + 



p p S 2 /(p- 2 ) 



E 

2A 
BP 



sup |e(tA^ )( 

0<t<T 



+ 



< T} 



Take any k > 0. To complete the proof we choose 5 sufficiently small so that the second 
term on the right hand side of ()4.13|) is bounded by k/A and then R and e sufficiently 
large/small so that the third and fourth terms are bounded by k/A. Now reduce r so that 
Lemma 14.11 applies. Then, by further reduction of r in Lemma 14.21 we upper-bound the 
first term by k/4. fLemma 14.21 calculates the error conditioned on the true and numerical 
solutions staying within a ball of radius R, and away from small sets where the error control 
mechanism breaks down. With this conditioning it follows from Lemma 14.11 that we have 
A n = 0(t), which is the essence of why Lemma [4.21 holds.) 

Consequently we have 



E 



sup \X(t) 

0<t<T 



x(t)\ l 



< K 



and since k is arbitrary the required result follows. □ 

In the following, C is a universal constant independent of T, R, e, 5 and r. Likewise C R is 
a universal constant depending upon R, but independent of T, e, 5 and r, C R) t is a universal 
constant depending upon R and T, but independent of e, 5 and r and C RjE} t and so forth 
are defined similarly. The actual values of these constants may change from one occurrence 
to the next. 

Lemma 4.2. Assume that X ^ ^(0) and that r is sufficiently small for the conditions of 
Lemma \J~l\ to hold. Then the continuous interpolant of the numerical method, X(t), satisfies 
the following error bound: 



E 



sup \X(t A 8 Rje ) - x(t A 9 R;I 

0<t<T 



< Cr,€,tt. 



Proof. Using 



x(t A 9 Rje ) :=X + 



tA6 R . 



f(x(s))ds + 
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tA0 R . 



g(x(s))dW(s), 



equation ()2.3j) and Cauchy-Schwartz, we have that x '■= \X(t A 9 Ry6 ) — x(t A Rt6 )\ 2 , satisfies 



X 



< 2 



T 



f(X(s))-f(x(s)))ds + 
\f(X(s))-f(x(s))\ 2 ds + 



g(X( S ))-g{x(s)))dW(s 



the 



R.c 



g(X(s))-g(x(s)))dW(s 



Let 



E(s) :-- 



sup \X(t A 9 Rje ) — x(t A 9 Rt , 

0<t<s 



Then, from the local Lipschitz condition (|3.7|) and the Doob-Kolmogorov Martingale in- 
equality [26], we have for any t* < T 



ME(t*) < 2C R (T + 4:)E 
< 4C R (T + A)E 
(4.14) < AC R (T + 4) 



t*A$ f 



t*A6r.. 



\X(s)-x(s)\ 2 ds 



\X(s)-X(s)\ 2 + \X(s)-x(s) 



ds 



E 



t*A6 t 



\X(s) -X(s)\ 2 ds + / EE(s)ds 



Given s G [0, T A #r j£ ), let fc s be the integer for which s G [tfc s , t^+i)- Notice that t ka is 
a stopping time because A ks is a deterministic function of Ak s -i). We now bound 
the right hand side in (|4.14|) . From the local Lipschitz condition (|3.7|) . a straightforward 
calculation shows that 

\X(s) - X(s)\ 2 < C R (\x k f + l)(Al + \W(s) - W(t ks )\ 2 )- 
Now, for s < 0R e , using Lemma I4~TI 

\W{s) - W(t ks )\ 2 = s -t ks + 2 [ S [W{1) - W(t ks )]dW(l) 



2r, 



<( s -t fe J[l + /( s )]<-[l + /( s )]. 



Here 



I(s) 



is - ti 



[W(l)-W(t k3 ))dW(l) 



Let 7i s denote the a— algebra of Brownian paths up to time t kg . Then, conditioned on 7i s 
we have 

(4.15) EJ(s) < V2. 

Thus, using Lemma 14.11 (|3.8|) and the Lyapunov inequality [T3] , 



E 



|X(s) -X(s)|'rfs < E 



C fl (|s fc .| 2 + l)(4r 2 /e 2 + \W(s) - W(t ks )\ 2 )ds 

< C R , e rE / (l + \x k f)(l+I(s))ds 

Jo 

< C r ^ t (A 2/p+1 )t. 



To obtain the last line we condition on 7i s so that \x ks \ 2 and I(s) are independent; we then 
use (|4.15jl and the assumed moment bound. 
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In (I4.14J1 . we then have by Lemma f4. II 



I 



■t* 



E£(f) < C r ^ t t + AC r ^ t 



EE(s)ds. 



Applying the Gronwall inequality we obtain 



E sup (X(tA9 Rje )-x(tAe R>e )) 2 <C{R,e,T)r. 



0<t<T 



□ 



5. Stability Results 



For all of our stability results, in this and the following sections, we make the assumption 
that (|1.2j) holds, together with some conditions on the diffusion matrix. To be explicit we 
make 

Assumption 5.1. There exists finite positive a, (3 such that 



where (•, •) is the inner-product inducing the Euclidean norm \ ■ \ . Furthermore m = d, g is 
globally bounded and is globally invertible. 

The assumption is made, without explicit statement, for the remainder of the paper. We 
also assume, without explicit statement, that r < 2(3 so that (3 > 0. Finally we assume, also 
without explicit statement, that there is at least one point y € lR m such that 



This may implicitly force upper bounds on r and A max , although neither is necessarily 
restricted by this assumption. The existence of such a y is implied by Assumption 15. 1| 
which rules out / being identically constant. Then there exists y for which the function 



is non-zero in a neighbourhood of h — and (j5.16|) must hold, possibly after enforcing 
bounds on r and A max . 

Under Assumption 15.11 the solution of (jl.lj) exists for all t > jHJ HH] and the equation is 
geometrically ergodic [§1 123 EI] • The first stability result ensures that the method will not 
decrease its stepsize in such a way that it is unable to reach arbitrary finite times. 

Theorem 2. The stopping times Nj are almost surely finite. 

The next result is the main ergodic result of the paper. It ensures that the adaptive 
method has an attracting statistical steady state. Letting W' 1 denote the expectation under 
the Markov chain started at xq = y, k-\ = I, we have the following result. (Recall 6 occurring 
in the definition of stopping times Nj . ) 

Theorem 3. Assume that 5 > 5A max . The Markov chain {yj,lj} = {x^+i, k^} has a 
unique invariant measure n. Furthermore, if h : M m x Z + — > R is measurable and 



{f(x),x) <a- [3\x\ 2 WxeR m , 



(5.16) 



k*(y) = g(y,l). 



\f(y + hf(y))-f(y)\ 




The final result gives a moment bound on the continuous time interpolants of the numerical 
solution, mimicking that for the SDE itself. 
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Theorem 4. There exists a A > and a c > so that 

Eexp(A sup ||X(t)|| 2 ) < exp(A|X| 2 + cT) 
te[o,T] 

Eexp(A sup \\X(t) f) <exp(A|X| 2 + cT). 
te[o,T] 

6. Proof of Stability Results 

We start with a number of estimates which will be needed to prove the main results. It is 
useful to define 

£ n+ i = 2 v / A^(x*,5f(x n )?7 n+ i), | n+1 = A„[|5f(x„)?7 n+ i| 2 - <x 2 (x„)], 

n— 1 7i— 1 

M n = Mn = ^0+1. 

Observe that (a;*, g(x n )r] n+ i) is a Gaussian random variable conditioned on the values of x n 
and x* n . Hence the last two expressions are Martingales satisfying the assumptions of Lemma 
IA. II from the appendix. Also notice that the quadratic variations satisfy 

n— 1 n— 1 

(6.1) (M) n < J^4A,,-|x*|V and (M) n < ^ aA 2 a 4 . 

i=o i=o 

We start with a straightforward lemma. 
Lemma 6.1. The sequences {x* n } and {x n } satisfy 

K\ 2 < |x n | 2 + 2A n [a-/3|<| 2 ], 

\Xn+l\ 2 < Pn\x n \ 2 + A„[2<5 + 0" 2 ] + £ n+i + | n+ i- 

Hence 

n—l n—1 

(M) n <4o- 2 J2\ x j\ 2A j + 8a2 ®J2 A2 r 

j=0 j=Q 

PROOF: Taking the inner product of the equation 
with x* n and using the fact that the error control implies 

|/«)-/WI<T, 

a straightforward calculation from |3T], using (|1.2|) . gives the first result. To get the second 
simply square the expression (|2.1j) for x n+ i and use the first, noting that (3 n < 1. For the 
third use the first in the bound (j6.1|) for (M) n . □ 

Lemma 6.2. We have 

K+i| 2 < \X\ 2 + C t n+1 + M n+1 - 1 A(M) n+1 + M n+1 - 2(M) n+1 . 

_ (J 

where Cq = [25 + 4a 4 A max ] . Furthermore 

Pfsup{|x n | 2 - C t n } > |X| 2 + A) < 2exp ( - BA) 

where B is a positive constant depending only on a and f3. 

n 



Proof: Squaring the expression for x n+ i in (j2.1j) . bounding |x*| 2 by the first inequality 
in Lemma f6. II and summing gives 

|^n+i| 2 — \X\ 2 + Cot n+ i + S n+ i + S n+ i 

where 

n 

S n+1 = M n+1 - 2/3^ |4| 2 A fc , S n+1 = M n+1 - 4(7 4 A 

maxtn+l 

and M„, M n before. Using (jfi.lj) . one obtains 

maxtn+l ■ 

and 

(M) n+ i <4a 2 ^A fe |x^| 2 . 

fc=0 

Combining all of this produces, 

S n+ i < M n+1 - lA(M) n+1 and S n+1 < M n+l - 2(M) n+1 . 

The probabilistic estimate follows from the exponential martingale estimates from the Ap- 
pendix. □ 

Corollary 6.3. Then there exists a universal A > and C\ > so that for any stopping 
time N with < tjy < almost surely, for some fixed number £* ; one has 

Eexp(A sup |x n | 2 ) < Ciexp(A|X| 2 + AC t*) . 

0<n<N 

Proof. The result follows from Lemma [6.21 and the observation that 

Pf sup \x n \ 2 > \X\ 2 + C U + a)< p(sup{|x„| 2 - C t„} > \X\ 2 + A) . 

□ 

Lemma 6.4. The Markov chain {xN j }j&+ satisfies the Foster- Lyapunov drift condition 
E{\x Nj+1 \ 2 \F Nj } < exp(-2 7 -^r)|x^| 2 + exp(2^ + )[2a + (7 2 ]5 + . 

That is 

H\Vj+i?\Qj} <exp(-2 7 ^r)| % f + exp(2^ + )[2a + a 2 ](5 + . 

PROOF: Note that (1 + < e~' y ~ x for all x e [0, 2A max 0\. From Lemma l57tT we have 
where K n := A n [2a + a 2 ]. Defining 

7j = (nCoA" 1 ) 

we obtain 

E (7jv j+ i I aty +i 1 2 1 Fty ) < In, I x N] \ 2 + E ( 7m | ^ ) . 
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Now 

Nj+i-l N j+1 -1 

(6.2) A j <5 + A max = 5 + and ^ &j>8 = $~- 

l=Nj l=Nj 

Straightforward calculations show that 

7tv j+1 > exp(2^7"5)7 JVj 

and 

Ji+i < exp(2(35 + )j N .. 

Hence 

N j+1 -1 

E{\x Nj+1 \ 2 \F Nj } <exp(-2 7 -^r)|x^| 2 + exp(2^ + )E| ^ 
and for the required result we need to bound the last term. By ()6.2|) we have 

N J+1 -1 

E( ^ A,|^) < <5 + A ma:c = 5+ 

and we obtain 

e( ^ Ki\F Nj ) < [2a + a 2 }5 + . 

l=N 3 

This gives the desired bound. □ 

We now proceed to prove the ergodicity and moment bound. We prove geometric ergod- 
icity of the Markov chain {yj, lj} by using the approach highlighted in [231 • In particular we 
use a slight modification of Theorem 2.5 in [21]. Inspection of the proof in the Appendix 
of that paper shows that, provided an invariant probability measure exists, and this follows 
from Lemma 16.41 the set C in the minorization condition need not be compact: it simply 
needs to be a set to which return times have exponentially decaying tails. 

Let 

P(y,l,A) = ¥((y 1 ,l 1 )eA\(y ,l Q ) = (y,l)) 

where 

A e B(R m ) ® B(Z + ), (y, i)el m xZ + . 

We write A = (Ay, A t ) with A y G B(R m ) and A\ G B(Z+). 

The minorization condition that we use, generalizing that in Lemma 2.5 of [21], is now 
proved: 1 

Lemma 6.5. Let C be compact. For 5 > 5A max there is ( > 0, y G R m and e > such that 
P(y, I, A) > £u(A) VA G B(W m ) x B(Z + ), (y, I) G C x Z+ 

where 

u(A) = Leb[B(y,e) n A,} • G A,}. 



Note that although C is compact in the following, C x Z + is not. 
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Proof: Let M = Ni(2A max ) and N = N X (S). Recall the definition (pHKJ) of y. Since 
tu < 3A ma:r almost surely, setting r 2 = R 2 + B 2 Corollary 16 . 31 implies we can choose positive 
B and R sufficiently large so that 

P{ sup \x Tl \ 2 <r 2 \>\ 

0<n<M > 2 

and y G -8(0, r), and C C B(0, R). Label this event, with probability in excess of |, by E\. 
If £"i occurs then there exists / G {0, . . . , M} such that ki < fe*(S(0,r)). This follows by 
contradiction, since otherwise Aj = 2 J+1 A_i for j G {0, . . . , M} and 

M-1 M-1 oo 

^m ^ ^ Aj ^ A max ^ ^ 2 j ' ^ A rnax 2 A max . 

3=0 j=0 k=l 

However t M > 2A max , a contradiction. Once kj < k*(B(0, r)) it follows that fc n < fc*(S(0, r)) 

for n G {I M} as a consequence of the step-size selection mechanism. 

Assume that E 1 has occurred. By choice of e sufficiently small, B(y, e) C B(0, r). We now 
choose the rjj for j G {M, . . . ,N — 1} to ensure the event E 2 namely that 

XjEB(y,e), M +1 <j < N. 

It is possible to ensure that the event has probability pi > 0, uniformly for X G C and 
A; G Z + . The fact that Im G 5(0, r) gives uniformity in X G C. We prove an upper bound 
on the number of steps after M to get probability independent of k_i G Z + . To do this 
notice that k n < k*(B(0, r)) now for n G {j, . . . , X}, again as a consequence of the step-size 
mechanism. In fact k^ = k*(B(y,e)) = k*(y). This follows because an argument analogous 
to that above proves that there is I G {M + 1, . . . , N} for which k n < k*(B(y, e)) = k*(y) 
for n G {/,..., N}. Now k*(B(y,e)) = k*(y), by continuity of / and possibly by further 
reduction of e. Since kj < k*(y) = Q(y, 1) is not possible, it follows that k N = k*(y). 

If Ei and Ei both occur then, for some 7 > 0, the probability that y\ = x^ts) G A y is 
bounded below by r yLeb{A y n B(y, e)}, for some 7 > 0, because 

£iv = x* N -! + A N _ 1 g(x* N _ l )T] N , 

xn-i is in a compact set and g is invertible. The fact that rjj are i.i.d Gaussian gives the 
required lower bound in terms of Lebesgue measure. The final result follows with ( = 7Pi/2. 
□ 

With this minorization condition in hand, we turn to the proof of ergodicity. 

Proof of Theorem O The existence of an invariant measure n follows from the Foster- 
Lyapunov drift condition of Lemma 16.41 which gives tightness of the Krylov-Bogoljubov 
measures. Lemma 16.41 shows that the chain {yj,lj} repeatedly returns to the set C x Z + 
and that the return times have exponentially decaying tails. This generalizes Assumption 
2.2 in [21]. Lemma 16.51 gives a minorization condition enabling a coupling. Together these 
two results give Theorem El by applying a straightforward modification of the arguments in 
Appendix A of [2T]. □ 

Proof Theorem 0] We define the stopping time N by 

N = inf {n : t n > T\ 

n>0 

noting that 

T < t N < T + A max 
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Notice that (|2.5jl implies that 



sup |X(t)| 2 <C 1+ sup \x k \ 2 + 



sup 

(t-s)e[0,Amad,se[O,T] 



\W(t)-W(s)f 



0<t<t N 0<k<N 



Here we have used the fact that, by Lemma lfi~T| 

\x* k \ 2 < \x k \ 2 + 2A max a. 

From this relationship between the supremum of moments of X{t) and X(t), and from the 
properties of increments of Brownian motion, it follows that, to prove Theorem EJ it suffices 
to bound 



for some A > 0. However this follows from the fact that tjy < T + A max and Corollary 16.31 



We now provide some numerical experiments to complement the analysis of the previous 
sections. We begin, in this section, by demonstrating the importance of Assumption 13.21 in 
ensuring pathwise convergence. In the next section we discuss an abstraction of the method 
presented and studied in detail in this paper. Section|H]then shows how this abstraction leads 
to a variant of the method discussed here, tailored to the study of damped-driven Hamil- 
tonian systems. We provide numerical experiments showing the efficiency of the methods at 
capturing the system's invariant measure. 

In the convergence analysis, we made Assumption 13.21 the second part of which was to 
assume that the hitting time of small neighbourhoods of the set ^(0) is large with high 
probability. We now illustrate that this is not simply a technical assumption. We study the 
test problem 



where W is a real valued scalar Brownian motion. The set ^(O) comprises the points where 
f{y) '■= V ~ V 3 satisfies f(y) = and f'(y) = 0, that is the points ±1,0, ±4|. Since the 
problem is one dimensional the hitting time to neighbourhoods of these points is not small. 

For contrast we apply the algorithm to the systems in two and three dimensions found by 
making identical copies of the equation (|7.1|) in the extra dimensions with each dimension 
driven by an independent noise. Thus the set ^(O) comprises the tensor product of the 
set ±1,0, ±4^ m the appropriate number of dimensions. Small neighbourhoods of this set 
do have large hitting time, with high probability. To illustrate the effect of this difference 
between hitting times we show, in Figure ITU the average time step taken at a given location 
in space for (the first component of) y. Notice that in one dimension the algorithm allows 
quite large average steps in the neighbourhood of the points ±1, 0, ±^=. This does not happen 
in dimensions two and three because the probability that the other components of y is also 
near to the set ±1, 0, ±^j at the same time is very small. The effect of this large choice of 
time steps in one dimension is apparent in the empirical densities for (the first component 
of) y which are also shown in Figure I7~T| these are generated by binning two hundred paths 
of the SDE ()7.1|) over two hundred time units. It is important to realize that, although the 
algorithm in one dimension makes a very poor approximation of the empirical density, this 
occurs only because of a relatively small number of poorly chosen time-steps. Figure 17.21 
shows a histogram of the timesteps (k n values) taken in one, two and three dimensions. The 
plots are nearly identical, except that in one dimension the algorithm allows the method to 
take a small number of larger steps with k n = 4. 



Eexp(A sup \xk\ 2 ) 

0<k<N 



7. Numerical Experiments: Pathwise Convergence 



(7.1) 



dy = y-y 3 + dW, 
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FIGURE 7.1. Effect of the "bad set" $?(e) in different dimensions. On the right. 
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log 2 ( step size 



Figure 7.2. Gradient Problem - Timestep Historgram 

8. Generalizations of the Method 

The method given in (|2.1|) can be seen as a simple instance of a general class of methods 
based on comparing, with some error metric, one time step given by two methods of the 
form 

(8.2) x n+ i = x n + F(x n , A n+1 )A n+1 + G(x n , A n+1 )VA n+1 r] n 

Xn+1 Xn F\X n: 

)A n+ i + G(x n , A n+ i)v / A n+ i?7 n , 

where F, F, G, G are deterministic functions. The method in (j2.1j) was based on comparing 
the pair of explicit methods given by 

(8.3) x n+1 = x* + VA n+1 g(x n )r] n 

x n+ i = x n + ^X n+ ig(x k n )r] n 

where 

X n =Xn + -A n+ i[f(x n ) + f(x* n )}. 

In (12.1 J) , closeness was measured by the difference, divided by the step size, between the 
conditional expectations of one time step of the two different methods; this gives 

(8.4) -^\Ex n+1 -Ex n+1 \ = 2\F(x n ,A n+1 )-F(x n ,A n+1 )\ = \f(x n )-f{x* n )\ . 

From this point of view, it is clear that the method discussed thus far is one of a large 
family of methods. Depending on the setting, one might want to compare methods other 
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that the simple Euler methods used thus far. Also one can consider different error measures. 
In the next section, we study a damped-driven Hamiltonian problem and use ideas from 
symplectic integration to design an appropriate method. In the discussion at the end of the 
article, we return to the question of different error measures. 



9. Numerical Experiments: Long Time Simulations 

In this section, we demonstrate that the ideas established for the rather specific adaptive 
scheme studied, and for the particular hypotheses on the drift and diffusion, extend to a 
wider class of SDEs and adaptive methods. 

As a test problem we consider the Langevin equation 

(9.5) dq —p dt 

dp = - [5(q)p + <&'(<?)] dt + g{q)dW . 

where 25(q) = g 2 (q), 

4(,) = i(WV and «,(,) = ^±11. 

The preceding theory does not apply to this system since it is not uniformly elliptic; further- 
more it fails to satisfy (jl.2J) . However it does satisfy a Foster-Lyapunov drift condition and 
since it is hypoelliptic the equation itself can be proven geometrically ergodic [21]. In j2l], it 
was shown that the implicit Euler scheme was ergodic when applied to (|9.5|) . and a similar 
analysis would apply to a variety of implicit methods. Since the adaptive schemes we study 
in this section enforce closeness to such implicit methods, we believe that analysis similar 
to that in the previous section will extend to this Langevin equation and to the adaptive 
numerical methods studied here. 

We will compare two different methods based on different choices of the stepping method. 
The first is the Euler based scheme given in (j2.1|) . The second is the following scheme: 

(9.6) q n+1 =q* 

Pn+l =Pn + 9((ln)\^lVn+l 
Qn+1 =Qn+ = An 



where 




P n+1 =Pn~ ^ 7T - ^n-0\ — -I — - W±n+g\ — - V ^nVn+1 



Qn =q n +Pn^n, 

Pn =Pn - &(Qn) A n ~ S(q*) Pn A n . 

Once again we will use comparisons between the two updates (with and without bars) to 
control the error. As before, we control on the difference in the expected step. The point of 
the particular form used here is that, in the absence of noise and damping, the adaptation 
constrains the scheme to take steps which are close to those of the symplectic midpoint 
scheme, known to be advantageous for Hamiltonian problems; if the noise and damping are 
small, we expect the Hamiltonian structure to be important. 
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In both the Euler and Symplectic case, the stepping methods take the form 
(9.7) q n+l = q n + /«A„ 

Pn+l = Pn + fn 2) A n + g n ^/A^T] n+1 

Qn+l = Qn + fn 

—(2) I 

Pn+l = Pn + fn A n + 9n V ^nVn+1 

where f n and f n , g n , g n are adapted to T n -\. In this notation the metric becomes, 

In the remainder of this section, we present numerical experiments with the two methods 
just outlined. We study the qualitative approximation of the invariant measure, we quantify 
this approximation and measure its efficiency, and we study the behaviour of time-steps 
generated. 




q p 



Figure 9.3. Distribution of q,p with different r for Euler method The value 
of tolerance r is on the left of each figure. 

Figure 01 plots the numerically obtained time average of the position p and momentum 
q for various values of the tolerance r. The doted lines are the true invariant measure of 
the underling SDE which can be computed analytically in this particular case. Notice that 
the method appears stable for all values of r, in contrast to the forward Euler method 
which blows up when applied to this equation. Though apparently stable, the results are far 
from the true distribution for large r. Figure El gives the analogous plots for the adaptive 
symplectic method given in (|9.7|) . Notice that these methods seem to do a much better job 
of reproducing the invariant measure faithfully at large r. 
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Figure 9.4. Distribution of q,p with different r for the symplectic method. 
The value of tolerance r is on the left of each figure. 

It is also important to study accuracy per unit of computational effort. Figure 19*31 gives 
plots of the error in the total variation norm (the L 1 distance between the numerically 
computed time averages and the exact analytic answer verses the r used and versus the 
steps per unit of time; the latter provides a measure of unit cost. The top plots are for the 
momentum q and the bottom for the position p. The plots on the right also include two fixed 
step methods, one using simple the forward Euler scheme and the second using the first of 
the symplectic schemes. The fixed-step Euler schemes blows up for steps larger than those 
given. We make the following observations on the basis of this experiment: 

• The fixed-step symplectic method is the most efficient at small time-steps; 

• The adaptive symplectic method is considerably more efficient than the adaptive and 
fixed-step Euler methods; 

• The adaptive symplectic method is the most robust method, providing reasonable 
approximations to the invariant density for a wide range of r. 

Note that the adaptive methods have not been optimized and with careful attention might 
well beat the fixed-step methods, both as measured by accuracy per unit cost, as well as by 
robustness. Further study of this issue is required. 

10. Conclusions 

This paper proposes a simple adaptive strategy for SDEs which is designed to enforce geo- 
metric ergodicity, when it is present in the equation to be approximated; without adaptation 
methods such as explicit Euler may destroy ergodicity. As well as proving ergodicity, we also 
prove some exponential moment bounds on the numerical solution, again mimicking those 
for the SDE itself. Furthermore, we prove finite-time convergence of the numerical method; 
this is non-trivial because we do not assume (and it is not true in general) that the time-step 
sequence tends to zero with user input tolerance. It would be of interest to transfer this finite 
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Figure 9.5. Total variation error verses r for position (top left) and the 
momentum (bottom left). Total variation error verses r for steps per unit 
time (top right) and the momentum (bottom right). 

time convergence to a result concerning convergence of the invariant measures, something 
which is known for fixed time-step schemes |H2| ESI EH] ■ 

As discussed in section |H1 the scheme we study in detail here is prototypical of more 
advanced schemes comparing two more sophisticated methods and controlling both on drift 
and diffusion. Here we have mainly used simple forward Euler methods and controlled only 
on the drift: our error measure is based on the conditional means. The split-step approach 
we take allows for additional terms to be added to the error measure, to ensure that the 
diffusion step is also controlled. The general idea is to enforce the closeness of one step by 
two different methods. One has freedom in the choice of the methods and the measure of 
closeness. We now briefly mention to other error measure which make this idea specific. 

For simplicity let us assume work in one dimension though the ideas generalize directly to 
higher dimensions. The simple error control given in ()8.4|) controls only the difference in the 
expectation of one step of the two methods. However one can also use measure which ensure 
the closeness of the entire distribution of one time step of the two methods. Given x n = x n 
and A n+ i, one step of a method of the form ()8.2|) is Gaussian. Hence it is reasonable to 
require that the standard deviations are close to each other. The error criteria would then 
be 

— — \Ex n+1 - Ex n+1 \ + j= |StdDev(x w+ i) - StdDev(x n+ i)| = 

\F(x n , A n+ i) - F(x n , A n+ i)| + \ \G(x n , A n+ i)| - \G(x n , A n+1 )\ \ < r 

In some ways, comparing the mean and standard deviations is rather arbitrary. A more 
rational choice might be to ensure the closeness of the total variation distance of the densities 
after one time step of the two methods. A simple way to do this is to compare the relative 
entropy of the two distributions. Since the distributions are Gaussian this can be done 
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explicitly. One finds that the criteria based on controlling relative entropy per unit step is 
{F(x n ,A n+1 )-F(x n ,A n+1 ))> ( G(x n ,A n+1 f \ G(x n ,A n+1 f 

G(x n ,A n+1 ) 2 + yG(x n ,A n+1 y J ° g G(x n ,A n+1 y T ' 

It is interesting to note that this measure correctly captures the fact that one should measure 
the error in the drift on the scale of the variance. In other words if the variance is large, one 
does not need to be as accurate in calculating the drift as it will be washed out by the noise 
anyway. Since the above measure is expensive to calculate one can use that fact that ^ — 1 
is small to obtain the asymptotically equivalent criteria 

(F(x n ,A n+1 )-F(x n ,A n+1 )) 2 ls G(x n ,A n+1 ) 2 y 

G(x n ,A n+l ) 2 + 2\G(x n ,A n+1 ) 2 J T - 
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Appendix A. Two Exponential Martingale Estimates in Discrete Time 

Let {J-'n, n > 0} be a filtration. Let rjk be a sequence of random variables with rjk 
adapted to Tk and such that rjk+i conditioned on jF fc is normal with mean zero and variance 
a\ = ^[Vk+il^k] < oo. We define the following processes: 



1989. 



n 



n 





k=l 



n-1 



n-1 





k=0 



k=0 



As the notation suggests, (M) n and (M) n are the quadratic variation processes in that 
M% — (M) n and M% — (M) n are local martingales with respect to T n . 



Lemma A.l. Let a > and (3 > 0, then the following estimate holds: 
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If in addition a\ < a 2 G IR for all k G N almost surely, then 



P(supiV n > c) < 



P^sup (M k - -(M) k ) >(3j< e"A 
w/tere A 2 = 2a 2 + 1/cn. 

Proo f of Lemma \A . 11 We begin with the first estimate. Define N n = exp(aM n — ^~(M) n ) 
and observe that N n = ^{N^il^n}. This in turn implies that E|A^| = EiV n — Nq — 1 < 
oo. Combining these facts we see that N n is a Martingale. Hence, the Doob-Kolmogorov 
Martingale inequality implies 

EiVo _ 1 
c 

Since p(sup n (M n - f (M) n ) > /?) = P(sup n iV n > e Q/3 ), the proof is complete. 

The second estimate is obtained in the same way after some preliminary calculations. We 
define <f>(x) = |ln(l — 2x) and i/;(x,b) = —x — bx 2 . Observe that cip(x,b) = if)(cx,b/c) and 
4>(x) > i(x,b) if x G [0,|(^)] and b > 1. Now 

p(sup(M n -|(M>„)>/3 

Setting A 2 = 2a 2 + -, we have that j?ip(o- 2 , a) = ip(^,X 2 a) < 0(t|) for all k > since 
o| < o"* and A 2 a > 1. Defining 

Nn~- 

we have 

sup(M n - ^(M)n) > p) < P (supiV n > el 

Now recall that if £ is a unit Gaussian random variable then Eexp(c£ 2 ) = \j\/\ — 2c for 
c G (— |, |). By construction ^, conditioned on is a Gaussian random variable with 

variance less then |. Hence 





E(ex P (|)|^ 1 )=exp(-0(^i) 



Using this one sees that E{JV n+ i|.F n } = iV n and E|iV n | = 1 < oo, hence N n is a Martingale. 
By the same argument as before using the Doob-Kolmogorov Martingale inequality, we 
obtain the quoted result. □ 
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